An open-source molecular builder and free energy preparation workflow

Automated free energy calculations for the prediction of binding free energies of congeneric series of ligands to a protein target are growing in popularity, but building reliable initial binding poses for the ligands is challenging. Here, we introduce the open-source FEgrow workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. For a given ligand core and receptor structure, FEgrow enumerates and optimises the bioactive conformations of the grown functional group(s), making use of hybrid machine learning/molecular mechanics potential energy functions where possible. Low energy structures are optionally scored using the gnina convolutional neural network scoring function, and output for more rigorous protein–ligand binding free energy predictions. We illustrate use of the workflow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high quality dataset of protein–ligand complexes. Furthermore, we build a set of 13 inhibitors of the SARS-CoV-2 main protease from the literature, and use free energy calculations to retrospectively compute their relative binding free energies. FEgrow is freely available at https://github.com/cole-group/FEgrow, along with a tutorial.

The authors describe FEgrow, a software workflow for the structure preparation and energy preevaluation of protein ligand complexes aimed at automated relative binding free energy calculations. The software is a welcome contribution to the field and the manuscript is overall well written and presented.
I have a number of questions and concerns that should be addressed in a revised version of the paper.
First, it is mentioned that in the geometry optimisation phase, van der Waals radii are scaled to 0.8 and charges by 0.5. How was this choice calibrated/validated? Concerning crystal water in BACE(hunt), the manuscript states: "Accurately including the energetics and effects on binding affinity of displacing water networks in hydrated binding pockets is evidently still a challenge." How is this accounted for in FEgrow? As this is a critical phase in the structure preparation task that FEgrow is designed for, an effort should be made to realistically model internal water molecules, for example in a retrospective study by matching known crstallographic water positions.
The manuscript states: "The network of alchemical transformations was built manually to include cycle closures for error analysis, and is shown in Figure S3." Which deviations from zero were obtained?
In the methods section it is not mentioned which charge model was used for GAFF2? Also, a 10A cutoff was mentioned for the simulations. Is that a plain cut-off? For electrostatic interactions this is known to produce artifacts. Please comment.
Reviewer #2 (Remarks to the Author): The manuscript of Bieniek et al. concerns the workflow (FEgrow) for building ligand poses in protein binding sites by growing functional groups from a core compound. The workflow is designed to set up input structures for relative binding free energy (RBFE) calculations. Since RBFE calculations based on molecular dynamics (MD) or Monte Carlo (MC) were shown to provide accurate predictions of proteinligand binding affinity, their use in drug design projects is becoming increasingly common. Set-up of input structures for RBFE calculations is a crucial step because incorrectly prepared input structures can result in either failure of calculations or erroneous binding free energy predictions. Since manual preparation of input structures for a large number of RBFE calculations can be time-consuming, the automated set-up of RBFE calculation input is an important and relevant task. Therefore, I believe that the paper will be of interest for researchers dealing with computer-aided drug design. Since the FEgrow workflow is an open-source project and freely available, I believe that the paper will facilitate new advances in de novo drug design and protein-ligand binding free energy calculations. The authors demonstrated the applicability and efficiency of their workflow by performing two case studies. In the first case study, FEgrow workflow was applied to 10 targets from the protein-ligand benchmark set to regrow congeneric ligands in the binding pockets. In the second case study FEgrow workflow was applied to generate input structures for RBFE calculations for a set of 13 congeneric inhibitors for the SARS-CoV-2 main protease. Average errors between binding free energies computed by FEgrow and experiment are acceptable for both case studies. RBFE calculations demonstrated sufficient correlation and low error with respect to experiment. The reported results provide convincing evidence of the applicability and efficiency of FEgrow workflow.
The manuscript is clearly written and well organized. The literature on de novo drug design and free energy calculations is summarized appropriately. The workflow design and obtained results are described consistently, coherently and with sufficient detail. I believe that the manuscript should be accepted with revisions.
The following are my comments and critique: 1. In the introduction, the authors mention several software packages capable of generating analogs of a hit compound by adding chemical functional groups to a rigid ligand core. These packages, therefore, have the same functionality as the workflow presented (FEgrow). The authors do not mention whether E-novo, Frag Explorer, DeepFrag, and Develop packages were used for set-up of input structures for RBFE calculations. If they can be used for this purpose, then it is unclear what are the advantages of FEgrow over these packages.
2. For SARS-CoV-2 main protease, RBFE of perampanel analogs were previously computed by Jorgensen research group using a workflow (BOMB/MCPRO) similar to FEgrow workflow. The authors compare their RBFE calculations with experiment but not with RBFE calculations reported by Jorgensen group. Since both workflows are based on the same approach for preparation of input ligand structures for free energy calculations, I believe the comparison of FEgrow/SOMD and BOMB/MCPRO would be appropriate. In particular, it is worth mentioning which workflow achieved better agreement between generated ligand structures with crystal structures and better accuracy of free energy calculations with respect to the experiment.
3. "Input and Constrained Conformer Generation" sections states that distance constraints to the common core are used for the energy minimization of generated conformers. The authors did not indicate which functon(e. g. root mean square distance) is used for these constraints.
4. According to "Geometry Optimisation" section, protein missing atoms, residues and hydrogen atoms are added using PDBFixer. Given that, it is unclear why the authors used Modeller and Chimera for the addition of missing residues and hydrogen atoms for SARS-CoV-2 main protease structure (PDB ID: 7L10) as indicated in the "Computational Methods" section. 5. In "Geometry Optimisation" section, paragraph 3 states: "The simulation uses the AMBER FF14SB force field …". I assume that "simulation" here applies to energy minimization mentioned in the previous sentence. The term "simulation" may be misleading: given that this is the first sentence of the paragraph, one can think that the simulation is a separate step following energy minimization. Therefore, I suggest either replacing "simulation" with "energy minimization" or edit this and previous sentence in such a way that the connection between these two terms is explicit. 6. "Case Study I: Protein-Ligand Benchmarks" section, paragraph 3: references to the Fig. 2 (a) and (b) for TYK2 and Thrombin systems are needed. 7. Table S1 in Supporting Information: for each 2D structure of common core, I suggest highlighting atoms that were used as attachment points for growing functional groups (e. g. as it was done in figure 1 in the main text). This will make this figure more informative for the illustration of case study 1.
8. Both case studies: to quantitatively characterize overall agreement of ligand poses generated by FEgrow workflow with those presented in crystal structures, I suggest calculating root mean square deviation (RMSD) of functional groups of FEgrow generated ligand structures with respect to corresponding crystal structures. This quantity can be computed for each pair of structures (generated -crystal) and then averaged for all structures for the same protein target. Thus, one can estimate how accurate FEgrow can reproduce ligand poses in crystal structures in general and also compare performance of FEgrow for different protein targets.
9. Some details of free energy simulations are not presented neither in "Computational Methods" section nor in Supporting Information. The missing details for the equilibration step are: simulation time, restraints applied, cutoff distance for nonbonded interactions, type of barostat and thermostat used. The missing details for the production step are: statistical ensemble, type of barostat and thermostat used. Also, the authors do not describe how the errors of free energies were calculated.
The authors describe FEgrow, a software workflow for the structure preparation and energy pre-evaluation of protein ligand complexes aimed at automated relative binding free energy calculations. The software is a welcome contribution to the field and the manuscript is overall well written and presented.
I have a number of questions and concerns that should be addressed in a revised version of the paper.
Response: Thank you very much for your comments on the manuscript and helpful suggestions for improvement, which we address below.
1) First, it is mentioned that in the geometry optimisation phase, van der Waals radii are scaled to 0.8 and charges by 0.5. How was this choice calibrated/validated?

Response:
This choice was made based on recommended settings in the BOMB de novo design software (which are unfortunately unpublished), and physical intuition. Since the optimisation stage is performed with a rigid binding site, it is desirable to include conformers that could be reasonably accommodated by some small adjustments in the receptor during subsequent MD simulations. Similarly, it is reasonable to include a factor to mimic the dielectric response that would be used to screen charges in the presence of a flexible binding site and solvent.
It is challenging to unequivocally validate this choice, but we have added a further analysis in the revised Supporting Information. Namely, for the thrombin target in Case Study I, we regenerate the structures of the R-groups and compute their gnina CNN binding affinities for a selection of Lennard-Jones scaling factors (0.8-1.0) and relative permittivities (1-4).
Interestingly, the correlation between predicted and experimental binding affinities is largely unchanged, but with no scaling the predicted affinities move to more negative values. The structure is also closer to experiment in the absence of scaling. We discuss these data further in the revised Section S1. We do prefer to keep the scaling behaviour as default in FEgrow, to allow for ligands to be more easily accommodated in the binding pocket, but also give users the option to change these values at run time.
2) Concerning crystal water in BACE(hunt), the manuscript states: "Accurately including the energetics and effects on binding affinity of displacing water networks in hydrated binding pockets is evidently still a challenge." How is this accounted for in FEgrow? As this is a critical phase in the structure preparation task that FEgrow is designed for, an effort should be made to realistically model internal water molecules, for example in a retrospective study by matching known crystallographic water positions.
Response: Crystal water molecules may be optionally retained in FEgrow, but only as part of the rigid receptor. That is, crystal water coordinates may be included in the input receptor structure by the user, and they will remain fixed during the geometry optimisation and scoring phases. We have expanded our description of this approach in the Geometry Optimisation section, in the revised manuscript.
Concerning the more general problem of how to include the effects of displaced water networks in pose generation and scoring, there is unfortunately no good solution (to the best of our knowledge). For example, one could imagine including extra water molecules as part of the ligand's flexible R-group, but how would one compare energetics of poses with differing numbers of water molecules? How many water molecules should be added to define the network? How would the scoring function account for this (water molecules are not included in the gnina CNN training data)?
In the revised manuscript, we have moved the discussion of crystallographic water to the Discussion section, where we also discuss possible future research directions, including for example the use of grand canonical Monte Carlo sampling for prediction of optimal water placement for binding poses generated by FEgrow. Figure S3." Which deviations from zero were obtained?

3) The manuscript states: "The network of alchemical transformations was built manually to include cycle closures for error analysis, and is shown in
Response: Cycle closure errors have been computed from the raw free energy data, and are now reported in Table S6. All cycle closure errors are < 1 kcal/mol.

4) In the methods section it is not mentioned which charge model was used for GAFF2?
Response: The AM1-BCC charge model was used, and this information has now been added to the revised Computational Methods.

5) Also, a 10A cutoff was mentioned for the simulations. Is that a plain cut-off? For electrostatic interactions this is known to produce artifacts. Please comment.
Response: Thank you for pointing out this oversight, we should have also mentioned that electrostatic interactions were computed using the reaction-field method with a tapered cut-off (at 10 Ang), so there is no abrupt cut-off. This is consistent with previous uses of SOMD (https://doi.org/10.1021/acs.jcim.0c00165). We have clarified the treatment of long-ranged interactions in the revised Computational Methods.

Reviewer #2 (Remarks to the Author):
The manuscript of Bieniek et al. concerns the workflow (FEgrow) for building ligand poses in protein binding sites by growing functional groups from a core compound. The workflow is designed to set up input structures for relative binding free energy (RBFE) calculations. Since RBFE calculations based on molecular dynamics (MD) or Monte Carlo (MC) were shown to provide accurate predictions of protein-ligand binding affinity, their use in drug design projects is becoming increasingly common. Set-up of input structures for RBFE calculations is a crucial step because incorrectly prepared input structures can result in either failure of calculations or erroneous binding free energy predictions. Since manual preparation of input structures for a large number of RBFE calculations can be time-consuming, the automated set-up of RBFE calculation input is an important and relevant task. Therefore, I believe that the paper will be of interest for researchers dealing with computer-aided drug design. Since the FEgrow workflow is an open-source project and freely available, I believe that the paper will facilitate new advances in de novo drug design and protein-ligand binding free energy calculations.
The authors demonstrated the applicability and efficiency of their workflow by performing two case studies. In the first case study, FEgrow workflow was applied to 10 targets from the protein-ligand benchmark set to regrow congeneric ligands in the binding pockets. In the second case study FEgrow workflow was applied to generate input structures for RBFE calculations for a set of 13 congeneric inhibitors for the SARS-CoV-2 main protease. Average errors between binding free energies computed by FEgrow and experiment are acceptable for both case studies. RBFE calculations demonstrated sufficient correlation and low error with respect to experiment. The reported results provide convincing evidence of the applicability and efficiency of FEgrow workflow.
The manuscript is clearly written and well organized. The literature on de novo drug design and free energy calculations is summarized appropriately. The workflow design and obtained results are described consistently, coherently and with sufficient detail. I believe that the manuscript should be accepted with revisions.
The following are my comments and critique: Response: Thank you very much for your comments on the manuscript and helpful suggestions for improvement, which we address below.
1) In the introduction, the authors mention several software packages capable of generating analogs of a hit compound by adding chemical functional groups to a rigid ligand core. These packages, therefore, have the same functionality as the workflow presented (FEgrow). The authors do not mention whether E-novo, Frag Explorer, DeepFrag, and Develop packages were used for set-up of input structures for RBFE calculations. If they can be used for this purpose, then it is unclear what are the advantages of FEgrow over these packages.

Response:
We are not aware of specific examples of outputs of these packages being used for free energy calculations, but the generative models in particular certainly could be used for this purpose. However, still today the majority of drug discovery campaigns use suggestions by experienced medicinal chemists to investigate structure-activity relationships around a lead series, rather than generative machine learning models.
Our motivation, therefore, with FEgrow was to replicate the success of the BOMB/MCPRO approach, which uses a molecular builder in conjunction with free energy calculations to help users to design extremely potent inhibitors of, for example, HIV reverse transcriptase. We have implemented these methods in an open-source, easy-to-use package that could be used by medicinal chemists to investigate their own designs. Essentially, we have automated the setup stages of free energy calculations that are currently performed manually.
As a result of our modular design of FEgrow, it was straightforward to then add the option of hybrid machine learning / molecular mechanics potentials for structural optimisation. Machine learning potentials show great promise and are significantly more reliable than, for example, the MM3 and UFF force fields that are used in Fragexplorer and DeepFrag, for example, for structural refinement. Again, these packages could be extended to add these functionalities, but we found it more convenient to write our own.
We have expanded the Discussion section of the revised manuscript to further discuss the philosophy and advantages of FEgrow.